ABSTRACT


We assess the performance of generic text summarization algorithms applied to films and documentaries,
using extracts from news articles produced by reference models of extractive summarization.
We use three datasets: (i) news articles, (ii) film scripts and subtitles, and (iii) documentary subtitles. 
Standard ROUGE metrics are used for comparing generated summaries against news abstracts, plot 
summaries, and synopses. We show that the best performing algorithms are LSA, for news articles 
and documentaries, and LexRank and Support Sets, for films. Despite the different nature of films and 
documentaries, their relative behavior is in accordance with that obtained for news articles. 

©2016 Elsevier Ltd. All rights reserved. 


1. Introduction 

Input media for automatic summarization have varied from
text lUSl 13 to speech EDimiMi and video HI, but the application
domain has been, in general, restricted to informative
sources: news O [30] [33] [TTl, meetings [26] El, or lectures
Q. Nevertheless, application areas within the entertainment
industry are gaining attention: e.g., summarization of literary
short stories [12], music summarization ifSTl, summarization
of books [24], or inclusion of character analyses in movie
summaries [36]. We follow this direction, creating extractive,
text-driven video summaries for films and documentaries.

Documentaries started as cinematic portrayals of reality ca.
Today, they continue to portray historical events, argumentation,
and research. They are commonly understood as capturing
reality and are therefore seen as inherently non-fictional. Films, in
contrast, are usually associated with fiction. However, films and
documentaries do not fundamentally differ: many of the strategies
and narrative structures employed in films are also used in
documentaries ED.

In the context of our work, films (fictional) tell stories based
on fictive events, whereas documentaries (non-fictional) address,
mostly, scientific subjects. We study the parallelism between
the information carried in subtitles and scripts of both
films and documentaries. Extractive summarization methods
have been extensively explored for news documents flSl [22]
ITtI 123 [30] [23]. Our main goal is to understand the quality
of automatic summaries, produced for films and documentaries,
using the well-known behavior of news articles as reference.
Generated summaries are evaluated against manual
abstracts using ROUGE metrics, which correlate with human
judgements (TSlfTTI .

This article is organized as follows: Section 2 presents the
summarization algorithms; Section 3 presents the collected
datasets; Section 4 presents the evaluation setup; Section 5
discusses our results; Section 6 presents conclusions and
directions for future work.


2. Generic Summarization 

Six text-based summarization approaches were used to summarize
newspaper articles, subtitles, and scripts. They are described
in the following sections.


2.1. Maximal Marginal Relevance (MMR)

MMR is a query-based summarization method [4]. It iteratively
selects sentences via Equation 1 ($Q$ is a query; $Sim_1$
and $Sim_2$ are similarity metrics; $S_i$ and $S_j$ are non-selected and
previously selected sentences, respectively). $\lambda$ balances
relevance and novelty. MMR can generate generic summaries by
considering the centroid of the input sentences as a query (^1^ .

$$\arg\max_{S_i} \left[ \lambda\, Sim_1(S_i, Q) - (1 - \lambda) \max_{S_j} Sim_2(S_i, S_j) \right] \tag{1}$$
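The selection rule in Equation 1 can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the experiments: the helper names (`cosine`, `mmr_summary`), the whitespace tokenizer, and the use of raw term counts with cosine similarity for both $Sim_1$ and $Sim_2$ are assumptions made for the example. The query is the centroid of the input sentences, as described above.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def mmr_summary(sentences, size, lam=0.5):
    """Greedy MMR selection; the query Q is the centroid of all sentences."""
    bags = [Counter(s.lower().split()) for s in sentences]
    centroid = Counter()
    for bag in bags:
        centroid.update(bag)

    selected = []
    candidates = list(range(len(sentences)))
    while candidates and len(selected) < size:

        def score(i):
            relevance = cosine(bags[i], centroid)  # Sim_1(S_i, Q)
            # max over already selected sentences of Sim_2(S_i, S_j)
            redundancy = max(
                (cosine(bags[i], bags[j]) for j in selected), default=0.0
            )
            return lam * relevance - (1 - lam) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return [sentences[i] for i in selected]
```

With `lam=1.0` only relevance counts, so near-duplicate sentences can both be selected; lower values penalize candidates that resemble sentences already in the summary.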
2.2. LexRank

LexRank [6] is a centrality-based method based on Google’s
PageRank fS]. A graph is built using sentences, represented by
TF-IDF vectors, as vertexes. Edges are created when the cosine
similarity exceeds a threshold. Equation 2 is computed at each
vertex until the error rate between two successive iterations is
lower than a certain value. In this equation, $d$ is a damping
factor to ensure the method’s convergence, $N$ is the number of
vertexes, and $S(V_i)$ is the score of the $i$th vertex.

$$S(V_i) = \frac{1 - d}{N} + d \times \sum_{V_j \in adj[V_i]} \frac{Sim(V_i, V_j)}{\sum_{V_k \in adj[V_j]} Sim(V_j, V_k)}\, S(V_j) \tag{2}$$
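The iteration in Equation 2 can be sketched as follows. The sketch assumes that a symmetric sentence similarity matrix has already been computed (for instance, cosine similarity between TF-IDF vectors); the function name `lexrank` and the default values for the threshold, the damping factor, and the stopping tolerance are illustrative choices, not values reported in this article.

```python
def lexrank(sim, threshold=0.1, d=0.85, tol=1e-6, max_iter=200):
    """Iterate Equation 2 on a symmetric similarity matrix `sim`.

    Assumes threshold >= 0, so every neighbour has a positive total weight.
    """
    n = len(sim)
    # Edges exist only where the similarity exceeds the threshold.
    adj = [
        [j for j in range(n) if j != i and sim[i][j] > threshold]
        for i in range(n)
    ]
    # Denominator of Equation 2: total similarity of V_j to its neighbours.
    total = [sum(sim[j][k] for k in adj[j]) for j in range(n)]

    scores = [1.0 / n] * n
    for _ in range(max_iter):
        new = [
            (1 - d) / n
            + d * sum(sim[i][j] / total[j] * scores[j] for j in adj[i])
            for i in range(n)
        ]
        error = max(abs(a - b) for a, b in zip(new, scores))
        scores = new
        if error < tol:  # stop when successive iterations agree
            break
    return scores
```

In a small graph where one sentence is similar to two others that are unrelated to each other, the central sentence receives the highest score, which is the behavior the centrality criterion is meant to capture.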


2.3. Latent Semantic Analysis (LSA)

LSA infers contextual usage of text based on word co-occurrence
[13] [I3I. Important topics are determined without
the need for external lexical resources [9]: each word’s
occurrence context provides information concerning its meaning,
producing relations between words and sentences that correlate
with the way humans make associations. Singular Value
Decomposition (SVD) is applied to each document, represented by
a $t \times n$ term-by-sentences matrix $A$, resulting in its
decomposition $U \Sigma V^T$. Summarization consists of choosing the $k$ highest
singular values from $\Sigma$, giving $\Sigma_k$. $U$ and $V^T$ are reduced to
$U_k$ and $V_k^T$, respectively, approximating $A$ by $A_k = U_k \Sigma_k V_k^T$.
The most important sentences are selected from $V_k^T$.
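As a rough illustration of how sentences can be read off the right singular vectors, the sketch below handles only the simplest case, $k = 1$. Instead of a full SVD it uses power iteration on $A^T A$, whose leading eigenvector is the first row of $V^T$; this substitution, the raw term-count weighting, and the function names are assumptions made to keep the example dependency-free, and they are not the setup used in the experiments.

```python
import math


def term_sentence_matrix(sentences):
    """Build the t x n term-by-sentences count matrix A."""
    vocab = sorted({w for s in sentences for w in s.lower().split()})
    index = {w: t for t, w in enumerate(vocab)}
    A = [[0.0] * len(sentences) for _ in vocab]
    for j, s in enumerate(sentences):
        for w in s.lower().split():
            A[index[w]][j] += 1.0
    return A


def dominant_right_singular_vector(A, iters=100):
    """Leading eigenvector of A^T A, i.e. the first row of V^T."""
    n = len(A[0])
    G = [[sum(row[i] * row[j] for row in A) for j in range(n)] for i in range(n)]
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iters):
        w = [sum(G[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def lsa_rank(sentences):
    """Order sentence indices by their weight in the dominant topic (k = 1)."""
    v = dominant_right_singular_vector(term_sentence_matrix(sentences))
    return sorted(range(len(sentences)), key=lambda j: -abs(v[j]))
```

A sentence that shares no vocabulary with the rest of the document gets a weight close to zero in the dominant topic and is ranked last; handling $k > 1$ would require the remaining singular vectors as well.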

2.4. Support Sets 

Documents are typically composed of a mixture of subjects,
involving a main theme and various minor themes. Support sets are
defined based on this observation 1^ . Important content is
determined by creating a support set for each passage, by comparing
it with all others. The most semantically related passages,
determined via geometric proximity, are included in the support
set. Summaries are composed by selecting the most relevant
passages, i.e., the ones present in the largest number of support
sets. For a segmented information source $I = p_1, p_2, \ldots, p_N$,
the support set $S_i$ of each passage $p_i$ is defined by Equation 3,
where $Sim$ is a similarity function and $\epsilon_i$ is a threshold. The
most important passages are selected by Equation 4.

$$S_i = \{ s \in I : Sim(s, p_i) > \epsilon_i \wedge s \neq p_i \} \tag{3}$$

$$\arg\max_{s \in \cup_{i=1}^{N} S_i} \left| \{ S_i : s \in S_i \} \right| \tag{4}$$
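Equations 3 and 4 translate almost directly into set operations. The sketch below is only an illustration: it assumes a single global threshold instead of a per-passage $\epsilon_i$, uses a Jaccard word-overlap measure as a stand-in for $Sim$, and the function names are hypothetical.

```python
from collections import Counter


def jaccard(a, b):
    """Word-overlap similarity between two passages (assumed Sim function)."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0


def support_sets(passages, eps, sim=jaccard):
    """Equation 3 with one global threshold eps for every passage."""
    n = len(passages)
    return [
        {s for s in range(n) if s != i and sim(passages[s], passages[i]) > eps}
        for i in range(n)
    ]


def rank_passages(passages, eps, sim=jaccard):
    """Equation 4: order passages by the number of support sets they belong to."""
    counts = Counter(s for S in support_sets(passages, eps, sim) for s in S)
    return sorted(counts, key=lambda s: (-counts[s], s))
```

Passages that appear in no support set are never returned, which matches the restriction of the arg max in Equation 4 to the union of the support sets.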


2.6. Graph Random-walk with Absorbing StateS that HOPs 
among PEaks for Ranking (GRASSHOPPER) 

GRASSHOPPER [40] is a re-ranking algorithm that maximizes
diversity and minimizes redundancy. It takes a weighted
graph $W$ ($n \times n$: $n$ vertexes representing sentences; weights
are defined by a similarity measure), a probability distribution
$r$ (representing a prior ranking), and $\lambda \in [0, 1]$, which
balances the relative importance of $W$ and $r$. If there is no
prior ranking, a uniform distribution can be used. Sentences
are ranked by applying the teleporting random walks method
in an absorbing Markov chain, based on the $n \times n$ transition
matrix $\tilde{P}$ (calculated by normalizing the rows of $W$), i.e.,
$P = \lambda \tilde{P} + (1 - \lambda) \mathbf{1} r^T$. The first sentence to be scored is
the one with the highest stationary probability, $\arg\max_{i=1}^{n} \pi_i$,
according to the stationary distribution of $P$: $\pi = P^T \pi$.
Already selected sentences $g$ are turned into absorbing states, so
that they are never selected again, by defining
$P_{gg} = 1$ and $P_{gi} = 0, \forall i \neq g$. The expected number of visits
is given by matrix $N = (I - Q)^{-1}$ (where $Q$ is the block of $P$
restricted to the sentences not yet selected, and $N_{ij}$ is the expected
number of visits to sentence $j$ if the random walker began
at sentence $i$). We obtain the average over all possible starting
sentences to get the expected number of visits to the $j$th sentence,
$v_j$. The sentence to be selected is the one that satisfies
$\arg\max_j v_j$.
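The ranking procedure can be sketched in plain Python as below. It is a simplified reading of the description above, not the reference implementation: the function names are hypothetical, the stationary distribution is approximated by a fixed number of power-iteration steps, and the column averages of $N = (I - Q)^{-1}$ are obtained by solving $(I - Q)^T x = \mathbf{1}$ rather than by inverting the matrix.

```python
def solve(M, b):
    """Solve M x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    A = [row[:] + [b[i]] for i, row in enumerate(M)]
    for c in range(n):
        p = max(range(c, n), key=lambda i: abs(A[i][c]))
        A[c], A[p] = A[p], A[c]
        for i in range(c + 1, n):
            f = A[i][c] / A[c][c]
            for k in range(c, n + 1):
                A[i][k] -= f * A[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (A[i][n] - sum(A[i][k] * x[k] for k in range(i + 1, n))) / A[i][i]
    return x


def grasshopper(W, r, lam, k, iters=500):
    """Return the indices of the first k sentences in GRASSHOPPER order."""
    n = len(W)
    # P = lam * P_tilde + (1 - lam) * 1 r^T, with P_tilde the row-normalized W.
    P = [
        [lam * W[i][j] / sum(W[i]) + (1 - lam) * r[j] for j in range(n)]
        for i in range(n)
    ]
    # Stationary distribution pi = P^T pi, approximated by power iteration.
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    ranked = [max(range(n), key=lambda i: pi[i])]

    while len(ranked) < k:
        rest = [i for i in range(n) if i not in ranked]  # non-absorbing states
        m = len(rest)
        # Column sums of N = (I - Q)^{-1} are the solution of (I - Q)^T x = 1.
        M = [
            [(1.0 if a == b else 0.0) - P[rest[b]][rest[a]] for b in range(m)]
            for a in range(m)
        ]
        visits = [x / m for x in solve(M, [1.0] * m)]  # average over start states
        ranked.append(rest[max(range(m), key=lambda a: visits[a])])
    return ranked
```

On a graph with two clusters of mutually similar sentences, the first pick comes from the larger cluster and the second from the other one, since walks that start near an absorbed sentence end quickly and contribute few visits to its neighbours.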

3. Datasets 

We use three datasets: newspaper articles (baseline data),
films, and documentaries. Film data consists of subtitles and
scripts, containing scene descriptions and dialog. Documentary
data consists of subtitles containing mostly monologue. Reference
data consists of manual abstracts (for newspaper articles),
plot summaries (for films and documentaries), and synopses
(for films). Plot summaries are concise descriptions, sufficient
for the reader to get a sense of what happens in the film or
documentary. Synopses are much longer and may contain important
details concerning the turn of events in the story. All datasets
were normalized by removing punctuation inside sentences and
timestamps from subtitles.

3.1. Newspaper Articles 

TeMário (iSl is composed of 100 newspaper articles in
Brazilian Portuguese (Table 1), covering domains such as
“world”, “politics”, and “foreign affairs”. Each article has a
human-made reference summary (abstract).


2.5. Key Phrase-based Centrality (KP-Centrality) 

Ribeiro et al. (3^ proposed an extension of the centrality
algorithm described in Section 2.4, which uses a two-stage
important passage retrieval method. The first stage consists of
a feature-rich supervised key phrase extraction step, using the
MAUI toolkit with additional semantic features: the detection
of rhetorical signals, the number of Named Entities, Part-Of-Speech
(POS) tags, and 4 n-gram domain model probabilities
Golllll. The second stage consists of the extraction of the most
important passages, where key phrases are considered regular
passages.


Table 1: TeMário corpus properties.

                            AVG     MIN     MAX
#Sentences   News Story      29      12      68
             Summary          9       5      18
#Words       News Story     608     421    1315
             Summary        192     120     345


3.2. Films

We collected 100 films, with an average of 4 plot summaries
(minimum of 1, maximum of 7) and 1 plot synopsis per film
(Table 2). Table 3 presents the properties of the subtitles,
scripts, and the concatenation of both. Not all the information
present in the scripts was used: dialogs were removed in order
to make them more similar to plot summaries.


Table 2: Properties of plot summaries and synopses.

                                AVG     MIN     MAX
#Sentences   Plot Summaries       5       1      29
             Plot Synopses       89       6     399
#Words       Plot Summaries     107      14     600
             Plot Synopses     1677     221    7110


Table 3: Properties of subtitles and scripts.

                                     AVG      MIN      MAX
#Sentences   Subtitles              1573      309     4065
             Script                 1367      281     3720
             Script + Subtitles     2787     1167     5388
#Words       Subtitles             10460     1592    27800
             Script                14560     3493    34700
             Script + Subtitles    24640    11690    47140


3.3. Documentaries 


We collected 98 documentaries. Table 4 presents the properties
of their subtitles: note that the number of sentences is
smaller than in films, influencing ROUGE (recall-based) scores.


Table 4: Properties of documentary subtitles.

                AVG     MIN      MAX
#Sentences      340     212      656
#Words         5864    3961    10490


We collected 223 manual plot summaries and divided them
into four classes (Table 5): 143 “Informative”, 63 “Interrogative”,
9 “Inviting”, and 8 “Challenge”. “Informative” summaries
contain factual information about the program; “Interrogative”
summaries contain questions that arouse viewer curiosity,
e.g. “What is the meaning of life?”; “Inviting” summaries are
invitations, e.g. “Got time for a 24 year vacation?”; and “Challenge”
summaries entice viewers on a personal basis, e.g. “are you ready
for...?”. We chose “Informative” summaries due to their resemblance
to the sentences extracted by the summarization algorithms.
On average, there are 2 informative plot summaries per
documentary (minimum of 1, maximum of 3).


Table 5: Properties of the documentary plot summaries.

                               AVG     MIN     MAX
#Sentences   Informative         4       1      18
             Interrogative       4       1      19
             Inviting            6       2      11
             Challenge           5       2       9
#Words       Informative        82      26     384
             Interrogative     103      40     377
             Inviting          146      63     234
             Challenge         104      59     192


4. Experimental Setup 


For news articles, summaries were generated with the average
size of the manual abstracts (≈ 31% of their size).

For each film, two summaries were generated, by selecting a
number of sentences equal to (i) the average length of its manual
plot summaries, and (ii) the length of its synopsis. In contrast
with news articles and documentaries, three types of input
were considered: script, subtitles, and script+subtitles.

For each documentary, a summary was generated with the
same average number of sentences as its manual plot summaries
(≈ 1% of the documentary’s size).
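The summary sizes quoted above can be checked against the average sentence counts reported in the dataset tables. The snippet below is only a back-of-the-envelope check using those rounded averages (Tables 1 to 5); the dictionary and function names are illustrative.

```python
# (average summary sentences, average source sentences), from Tables 1-5.
AVG_SENTENCES = {
    "news articles": (9, 29),            # Table 1
    "film subtitles": (5, 1573),         # Tables 2 and 3 (plot summaries)
    "documentary subtitles": (4, 340),   # Tables 4 and 5 (informative summaries)
}


def compression_ratio(summary_len, source_len):
    """Fraction of the source sentences kept in the summary."""
    return summary_len / source_len


for name, (summary_len, source_len) in AVG_SENTENCES.items():
    print(f"{name}: {100 * compression_ratio(summary_len, source_len):.1f}%")
```

The news ratio comes out at roughly 31% and the documentary ratio at roughly 1%, in line with the figures given in the text, while film summaries built from plot-summary lengths keep well under 1% of the subtitles.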

Content quality of summaries is based on word overlap (as
defined by ROUGE) between generated summaries and their
references. ROUGE-N computes the fraction of n-grams in the
reference summaries that are correctly identified by the summarization
algorithms (cf. Equation 5: $RS$ are the reference summaries, $gram_n$
is an n-gram of length $n$, and $count_{match}(gram_n)$ is the maximum number
of n-grams of a candidate summary that co-occur with a set
of reference summaries). ROUGE-SU measures the overlap of
skip-bigrams (any pair of words in their sentence order, with
the addition of unigrams as counting unit). ROUGE-SU4 limits
the maximum gap length of skip-bigrams to 4.

$$\text{ROUGE-N} = \frac{\sum_{S \in RS} \sum_{gram_n \in S} count_{match}(gram_n)}{\sum_{S \in RS} \sum_{gram_n \in S} count(gram_n)} \tag{5}$$
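Equation 5 can be illustrated with a short function. This is a simplified sketch rather than the official ROUGE toolkit: it assumes lowercased whitespace tokenization, applies no stemming or stop-word removal, and the function names are chosen for the example.

```python
from collections import Counter


def ngrams(tokens, n):
    """Multiset of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, references, n=1):
    """Recall-oriented n-gram overlap between a candidate and its references."""
    cand = ngrams(candidate.lower().split(), n)
    matched = total = 0
    for ref in references:
        ref_counts = ngrams(ref.lower().split(), n)
        total += sum(ref_counts.values())  # denominator of Equation 5
        # Clipped matches: an n-gram counts at most as often as in the candidate.
        matched += sum(min(c, cand[g]) for g, c in ref_counts.items())
    return matched / total if total else 0.0
```

Because the denominator counts reference n-grams only, longer candidate summaries can only raise or maintain the score, which is why summary length matters when comparing algorithms with this metric.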


5. Results and Discussion 

Subtitles and scripts were evaluated against manual plot summaries
and synopses to define an optimal performance reference.
The following sections present averaged ROUGE-1,
ROUGE-2, and ROUGE-SU4 scores (henceforth R-1, R-2, and
R-SU4), and the performance of each summarization algorithm,
as a ratio between the score of the generated summaries and
this reference (relative performance). Several parametrizations
of the algorithms were used (we present only the best results).
Concerning MMR, we found that the best $\lambda$ corresponds to
a higher average number of words per summary. Concerning
GRASSHOPPER, we used the uniform distribution as prior.

5.1. Newspaper Articles (TeMário)

Table 6 presents the scores for each summarization algorithm.
LSA achieved the best scores for R-1, R-2, and R-SU4.
Figure 1 shows the relative performance results.

Table 6: ROUGE scores for generated summaries and original
documents against manual references. For MMR, $\lambda$ = 0.50;
Support Sets used Manhattan distance and Support Set
Cardinality = 2; KP-Centrality used 10 key phrases.

                  R-1     R-2     R-SU4    AVG #Words
MMR               0.43    0.15    0.18            195
Support Sets      0.52    0.19    0.23            254
KP                0.54    0.20    0.24            268
LSA               0.56    0.20    0.24            297
GRASSH.           0.54    0.19    0.23            270
LexRank           0.55    0.20    0.24            277
Original Docs     0.75    0.34    0.38            608


[Figure: bar chart of relative ROUGE-1, ROUGE-2, and ROUGE-SU4 performance
for MMR, Support Sets, KP, LSA, GRASSH., and LexRank.]

Fig. 1: Relative performance for news articles. For MMR, $\lambda$ =
0.50; Support Sets used Manhattan distance and Support Set
Cardinality = 2; KP-Centrality used 10 key phrases.
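Relative performance, as used in this section, is the ratio between an algorithm's ROUGE score and the score of the original documents. The snippet below works this out for the LSA row of Table 6; since it starts from the rounded values printed in the table, the percentages are approximate, and the names used are illustrative.

```python
# R-1, R-2 and R-SU4 values copied from Table 6 (news articles).
LSA = {"R-1": 0.56, "R-2": 0.20, "R-SU4": 0.24}
ORIGINAL_DOCS = {"R-1": 0.75, "R-2": 0.34, "R-SU4": 0.38}


def relative_performance(scores, reference):
    """Ratio between a summarizer's ROUGE scores and the reference scores."""
    return {metric: scores[metric] / reference[metric] for metric in scores}


for metric, ratio in relative_performance(LSA, ORIGINAL_DOCS).items():
    print(f"{metric}: {100 * ratio:.1f}%")
```

For LSA this gives roughly 75% for R-1, 59% for R-2, and 63% for R-SU4 of the score obtained by the full articles.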


5.2. Films 


Table 7 presents the scores for the film data combinations
against plot summaries. Overall, Support Sets, LSA,
and LexRank capture the most relevant sentences for plot
summaries. Algorithms that maximize diversity, such as
GRASSHOPPER and MMR, would be expected to perform
well in this context, because plot summaries are relatively
small and focus on the more important aspects of the film,
ideally without redundant content. However, our results show
otherwise. For scripts, LSA and LexRank are the best approaches
in terms of R-1 and R-SU4.


Table 7: ROUGE scores for generated summaries for subtitles,
scripts, and scripts concatenated with subtitles, against plot
summaries. For MMR, $\lambda$ = 0.50; Support Sets used the cosine
distance and threshold = 50%; KP-Centrality used 50 key
phrases.

                                      R-1     R-2     R-SU4    AVG #Words
MMR             Subtitles             0.07    0.01    0.02             52
                Script                0.14    0.01    0.03             53
                Script + Subtitles    0.12    0.01    0.03             71
Support Sets    Subtitles             0.23    0.02    0.06            150
                Script                0.25    0.02    0.07            133
                Script + Subtitles    0.29    0.03    0.09            195
KP              Subtitles             0.22    0.02    0.06            144
                Script                0.24    0.02    0.07            123
                Script + Subtitles    0.28    0.02    0.08            184
LSA             Subtitles             0.22    0.02    0.06            167
                Script                0.28    0.03    0.08            190
                Script + Subtitles    0.28    0.03    0.08            219
GRASSH.         Subtitles             0.17    0.01    0.04            135
                Script                0.21    0.02    0.06            121
                Script + Subtitles    0.22    0.02    0.06            118
LexRank         Subtitles             0.24    0.02    0.06            177
                Script                0.29    0.02    0.09            168
                Script + Subtitles    0.30    0.02    0.08            217
Original Docs   Subtitles             0.77    0.21    0.34          10460
                Script                0.74    0.23    0.36          14560
                Script + Subtitles    0.83    0.31    0.43          24640


Table 8: ROUGE scores for generated summaries for subtitles,
scripts, and scripts+subtitles, against plot synopses. For MMR,
$\lambda$ = 0.50; Support Sets used the cosine distance and threshold
= 50%; KP-Centrality used 50 key phrases.

                                      R-1     R-2     R-SU4    AVG #Words
MMR             Subtitles             0.08    0.01    0.02            435
                Script                0.16    0.03    0.06            745
                Script + Subtitles    0.11    0.01    0.03            498
Support Sets    Subtitles             0.25    0.04    0.08           1033
                Script                0.37    0.07    0.15           1536
                Script + Subtitles    0.42    0.08    0.16           1736
KP              Subtitles             0.24    0.04    0.08            952
                Script                0.36    0.07    0.14           1419
                Script + Subtitles    0.40    0.08    0.16           1580
LSA             Subtitles             0.31    0.06    0.11           1303
                Script                0.42    0.09    0.17           1934
                Script + Subtitles    0.45    0.10    0.18           2065
GRASSH.         Subtitles             0.34    0.06    0.12           1553
                Script                0.44    0.09    0.18           1946
                Script + Subtitles    0.47    0.10    0.19           1768
LexRank         Subtitles             0.34    0.06    0.12           1585
                Script                0.45    0.10    0.18           1975
                Script + Subtitles    0.48    0.10    0.19           2222
Original Docs   Subtitles             0.70    0.18    0.30          10460
                Script                0.73    0.24    0.37          14560
                Script + Subtitles    0.83    0.32    0.44          24640


Table 8 presents the scores for the film data combinations
against plot synopses. The size of synopses is very different
from that of plot summaries. Although synopses also focus on
the major events of the story, their larger size allows for a more
refined description of film events. Additionally, because summaries
are created with the same number of sentences as the
corresponding synopsis, higher scores are expected. From all
algorithms, LexRank clearly stands out with the highest scores
for all metrics (except for R-SU4, for scripts).

The script+subtitles combination was used in order to determine
whether the inclusion of redundant content would improve
the scores, over the separate use of scripts or subtitles.
However, in all cases (Figure]^, script+subtitles leads to worse
scores, when compared to scripts alone. The same behavior
is observed when using subtitles except for Support Sets-based
methods (Support Sets and KP-Centrality). For plot synopses,
the best scores are achieved by LexRank and GRASSHOPPER,
while, for plot summaries, the best scores are achieved by
LexRank and LSA. By inspection of the summaries produced
by each algorithm, we observed that MMR chooses sentences
with fewer words in comparison with all other algorithms
(normally, leading to lower scores). Overall, the algorithms behave
similarly for both subtitles and scripts.

5.3. Documentaries 

From all algorithms (Table 9), LSA achieved the best results
for R-1 and R-SU4, along with LexRank for R-1. KP-Centrality
achieved the best results for R-2. It is important to notice that
LSA also produces the summaries with the highest word count
(favoring recall). Figure 2 shows the relative performance results:
LSA outperformed all other algorithms for R-1 and R-SU4,
and KP-Centrality was the best for R-2; Support Sets and
KP-Centrality performed closely to LSA for R-SU4; the best
MMR results were consistently worse across all metrics (MMR
summaries have the lowest word count).

Table 9: ROUGE scores for generated summaries and original
subtitles against human-made plot summaries. For MMR,
$\lambda$ = 0.75; Support Sets used the cosine distance and threshold
= 50%; KP-Centrality used 50 key phrases.

                  R-1     R-2     R-SU4    AVG #Words
MMR               0.17    0.01    0.04             78
Support Sets      0.37    0.06    0.12            158
KP                0.37    0.07    0.12            149
LSA               0.38    0.06    0.13            199
GRASSH.           0.31    0.04    0.10            150
LexRank           0.38    0.05    0.12            183
Original Docs     0.83    0.37    0.46           5864


[Figure: bar chart of relative ROUGE-1, ROUGE-2, and ROUGE-SU4 performance
for MMR, Support Sets, KP, LSA, GRASSH., and LexRank.]

Fig. 2: Relative performance for documentaries against plot
summaries. For MMR, $\lambda$ = 0.75; Support Sets used cosine distance
and threshold = 50%; KP-Centrality used 50 key phrases.


5.4. Discussion 

News articles intend to answer basic questions about a particular
event: who, what, when, where, why, and often, how. Their
structure is sometimes referred to as “inverted pyramid”, where
the most essential information comes first. Typically, the first
sentences provide a good overview of the entire article and are
more likely to be chosen when composing the final summary.
Although documentaries follow a narrative structure similar to
films, they can be seen as more closely related to news than
films, especially regarding their intrinsic informative nature. In
spite of their different natures, however, summaries created by
humans produce similar scores for all of them. It is possible
to observe this behavior in Figure 3. Note that documentaries
achieve higher scores than news articles or films, when using
the original subtitles documents against the corresponding manual
plot summaries.

[Figure: bar chart of ROUGE-1, ROUGE-2, and ROUGE-SU4 scores for News Articles,
Documentaries, Films (Plot Summaries), and Films (Plot Synopses).]

Fig. 3: ROUGE scores for news articles, films, and documentaries
against manual references, plot summaries and synopses,
and plot summaries, respectively.

Figure presents an overview of the performance of each
summarization algorithm across all domains. The results concerning
news articles were the best out of all three datasets for
all experiments. However, summaries for this dataset preserve,
approximately, 31% of the original articles, in terms of sentences,
which is significantly higher than for films and documentaries
(which preserve less than 1%), necessarily leading
to higher scores. Nonetheless, we can observe the differences
in behavior between these domains. Notably, documentaries
achieve the best results for plot summaries, in comparison with
films, using scripts, subtitles, or the combination of both. The
relative scores on the films dataset are influenced by two major
aspects: the short sentences found in film dialogs; and the fact
that the generated summaries, being extracts from subtitles and
scripts, are not able to represent the film as a whole, in
contrast with what happens with plot summaries or synopses.
Additionally, the experiments conducted for script+subtitles for
films, in general, do not improve scores above those of scripts
alone, except for Support Sets for R-1. Overall, LSA performed
consistently better for news articles and documentaries. Similar
relatively good behavior had already been observed for meeting
recordings, where the best summarizer was also LSA 1^ . One
possible reason for these results is that LSA tries to capture the
relation between words in sentences. By inferring contextual
usage of text based on these relations, high scores, apart from
R-1, are produced for R-2 and R-SU4. For films, LexRank was
the best performing algorithm for subtitles, scripts, and the
combination of both, using plot synopses, followed by LSA and
Support Sets for plot summaries. MMR has the lowest scores
for all metrics and all datasets. We observed that sentences
closer to the centroid typically contain very few words, thus
leading to shorter summaries and the corresponding low scores.

Interestingly, by observing the average of R-1, R-2, and R-SU4,
it is possible to notice that it follows very closely the values
of R-SU4. These results suggest that R-SU4 adequately
reflects the scores of both R-1 and R-2, capturing the concepts
derived from both unigrams and bigrams.

Overall, considering plot summaries, documentaries
achieved higher results in comparison with films. However, in
general, the highest score for these two domains is achieved
using film scripts against plot synopses. Note that synopses
differ significantly from plot summaries in terms of number of
sentences. The average synopsis has 120
sentences, while plot summaries have, on average, 5 sentences
for films, and 4 for documentaries. This gives synopses a clear
advantage in terms of ROUGE (recall-based) scores, due to the
high count of words.


6. Conclusions and Future Work 

We analyzed the impact of the six summarization algorithms
on three datasets. The newspaper articles dataset was used as a
reference. The other two datasets, consisting of films and
documentaries, were evaluated against plot summaries, for films and
documentaries, and synopses, for films. Despite the different
nature of these domains, the abstractive summaries created by
humans, used for evaluation, share similar scores across metrics.

The best performing algorithms are LSA, for news and
documentaries, and LexRank for films. Moreover, we conducted
experiments combining scripts and subtitles for films, in order
to assess the performance of generic algorithms when redundant
content is included. Our results suggest that this combination
is unfavorable. Additionally, it is possible to observe that all
algorithms behave similarly for both subtitles and scripts. As
previously mentioned, the average of the scores follows closely
the values of R-SU4, suggesting that R-SU4 is able to capture
concepts derived from both unigrams and bigrams.

We plan to use subtitles as a starting point to produce video
summaries of films and documentaries. For films, the results
from our experiments using plot summaries show that the
summarization of scripts only marginally improved performance,
in comparison with subtitles. This suggests that subtitles are
a viable approach for text-driven film and documentary
summarization. This positive aspect is compounded by their being
broadly available, as opposed to scripts.